ABSTRACT

Two recent developments in management information system technology and higher education administration have brought about the need for this study, designed to develop a methodology for revealing a relational model of the data base that administrators are operating from currently or would like to be able to operate from in the future. Administrations of higher education have been forced to rely more heavily on information systems to respond to the demands for accountability and allocation of limited resources. Information systems technology, through the advent of data base management systems, is able to be more responsive to administrative information needs, provided the relationships within the data required by administrators are known. The analysis, conducted at the University of Minnesota, consisted of testing several data grouping techniques, including four hierarchical clustering methods, factor analysis, and observation of summary matrices on the data. Complete linkage and average linkage cluster analysis provided what appeared to be the most reliable groupings of the entities and were applied to the data. The methodology does reveal the relationships that respondents perceive to be in the data. The methodology as it was tested was effective as an aid to the data base designer in establishing a relational model of the data base. (Author/JAD)
Abstract

Two recent developments in management information system technology and higher education administration have brought about the need for this study. Administrations of higher education have been forced to rely more heavily on information systems to respond to the demands for accountability and allocation of limited resources. Information systems technology, through the advent of database management systems, is able to be more responsive to administrative information needs, provided the relationships within the data required by administrators are known. The problem is to develop a methodology for revealing a relational model of the database which administrators are operating from currently or would like to be able to operate from in the future.

The analysis consisted of testing several data grouping techniques, including four hierarchical clustering methods, factor analysis and observation of summary matrices on the data. Complete linkage and average linkage cluster analysis provided what appeared to be the most reliable groupings of the entities. These two clustering techniques were applied to the data.

The methodology does reveal the relationships which respondents perceive to be in the data. The methodology may or may not have revealed top administrators' views of the data, depending on how well administrative staff members understand their administrators' views. The methodology as it was tested was effective as an aid to the database designer in establishing a relational model of the database.
A Methodology for Data Structure Assessment
in Higher Education Administration

Introduction
... with it a whole new field of study: the study of data. We now understand from the work of Codd, Martin, Date and others that data has an inherent structure which can be traversed, manipulated and ... designed using the inherent data structure as a model will probably require less maintenance over time and be more responsive to current as well as future user needs.

However, the data processing industry has not yet developed an efficient relational database management system for large databases (by a relational system is meant here one which records all relationships in the data structure).
Problem 
The problem, then, is to develop a method for determining which of the relationships in a given data structure will be most frequently used and should therefore be reflected in an efficiently designed database. This paper discusses a methodology for determining the relational needs of the top administration of a large university.


Objectives 


The objectives of this study were to test the effectiveness of a variety of clustering techniques, with associated data collection procedures, for:

1. discovering the latent data element structures in the minds of higher education administrators based on their decision-making and information needs, and

2. communicating these structures to technical experts in the design of management information systems and databases.


Success in these objectives would permit the optimization of relationships among data elements in the development of a new management information system or the restructuring of an existing system.

Procedure

The methodology is a multistep process which includes: data element collection; normalization to identify entities; entity grouping by administrative users; and analysis of the groupings. The methodology was tested at the University of Minnesota using major data files currently in use (for a more detailed discussion, see Baltes, 1977).

Data Element Collection

The relative stability of a mature educational institution with well-established conventional data systems led to the assumption that the data elements required in the database system will be, for the most part, those which are already being collected by current data processing systems. The systems analysts in the data processing department identified fifteen master files which contained most of the data elements used in current systems. Although there are some 400 files currently in use, those not included in this study are either a subset of one of the fifteen files selected; a different ... of the files selected; similar to one of the fifteen selected but in a different sequence; or a data file from a system not considered to be an integral part of the University's administrative systems, such as the printing plant inventory file or the telephone billing file.


These fifteen files contained a total of approximately eleven hundred data items. The names and descriptions of the items were stored in a computerized data dictionary for ease of access and manipulation. The data items were also described on a deck of cards, which was used in the normalization process described in the next section.


Normalization

E. F. Codd defined the normalization technique as a method for grouping data items into a set of relations, producing a relational model. The normalization technique was applied to the data items collected in the previous step in order to group them into a manageable set of entities to be used in the survey described below.

The process of normalizing the eleven hundred items included the following steps:


(1) Data items whose function was primarily related to data processing and which contained no information for the user were eliminated. Examples of this type of data item include record codes, transaction codes, file identification items and update codes. Approximately three hundred of these data items were eliminated from the original set of eleven hundred.

(2) Systems analysts who were thoroughly familiar with the meanings and usage of the data items were asked to describe the relationships between data items.

(3) The remaining items were viewed as one relation of eight hundred domains, using Codd's definition. The normalizing steps were then followed, and the complex relation was, in effect, broken down into thirty-nine simple relations written in third normal form. Each simple relation was given a name based on the real-world entity which its data elements described. For example, the relation containing the data items which describe employees was called the employee entity.

Each of the simple relations contains data items which describe attributes of an entity which is either physical (for example, student, building), administrative (for example, registration, account) or organizational (for example, parent department).
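Step (3) can be illustrated, loosely, with a small Python sketch. It is not the decomposition the study carried out by hand; it swaps in a simplified synthesis-style grouping, and it assumes the analysts' answers from step (2) have already been written down as functional dependencies (a key determining a data item) that form a minimal cover. All item names are hypothetical.

```python
from collections import defaultdict

# Hypothetical dependencies from the analysts (step 2): (key items, dependent item).
DEPENDENCIES = [
    (("employee_id",), "employee_name"),
    (("employee_id",), "hire_date"),
    (("employee_id",), "department_code"),
    (("employee_id",), "record_code"),
    (("building_code",), "building_name"),
    (("building_code", "room_number"), "room_capacity"),
]

# Hypothetical processing-only items removed in step (1).
PROCESSING_ONLY = {"record_code", "update_code"}


def group_into_relations(dependencies, drop=frozenset()):
    """Group each dependent item with the key that determines it.

    Every distinct key becomes one relation whose attributes are the key
    items plus all items that depend on that key. If the dependencies form
    a minimal cover, each relation produced this way is in third normal
    form; computing the cover and checking for a lossless join are omitted.
    """
    relations = defaultdict(set)
    for key, item in dependencies:
        if item in drop:
            continue  # step (1): the item carries no information for the user
        relations[key].update(key)
        relations[key].add(item)
    return {key: sorted(attrs) for key, attrs in relations.items()}


for key, attrs in group_into_relations(DEPENDENCIES, PROCESSING_ONLY).items():
    print(key, attrs)
```

Each distinct key becomes one simple relation, so every item determined by the employee identifier lands in a single employee relation, much as the study named its relations after the real-world entity they describe.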

The thirty-nine entities yielded by the normalization process are as follows:

1. Advanced Standing Academic Record
2. Account
3. Accounting Balance
4. Alumni Membership
5. Applicant Test Scores
6. Application
7. Application Control
8. Appointment
9. Authorization
10. Building
11. Business Address
12. Campus Address
13. Class Schedule
14. Course
15. Course Section
16. Courses Registered For
17. Current Month Accounting Transaction
18. Deduction History
19. Deductions
20. Degree Awarded
21. Donor Entity
22. Employee
23. Employee Class
24. Employee Tax Status
25. Expense Class
26. Home Address
27. Insurance and Retirement
28. Non-College Academic Record
29. Parent Department
30. Position Control
31. Promotional History
32. Registration
33. Room
34. Salary History
35. Skill and Education
36. Student/Applicant
37. Student Financial Data
38. Student's Family Data
39. Year-to-date Payment


Further information on the normalization process may be found in the texts by Codd, Martin and Date already mentioned.


Entity Grouping

The subjects for the entity grouping survey were selected through the process ... the instructions which subjects were given are shown below. The survey yielded a number of entity groupings which the subjects believed would be useful to them.
Seventy persons were identified by the Vice Presidents of the University to be interviewed in a long-range planning effort being carried out by the data processing department concurrently with this study. During the interviewing process, several interviewees indicated that they do not currently use data produced by the data processing department, nor did they see any future use for such institution-wide data. An example of this kind of situation is the Director of the University Press, who is essentially running a small business which he manages primarily from data generated within his own department. The original seventy interviewees were reduced to fifty-four subjects for this survey. The group was made up of all vice presidents, provosts, vice-provosts and deans; selected persons with primarily planning responsibilities; and certain administrative staff persons.
The vice presidents originally selected this group of people as being representatives of their respective areas who ought to have input to the data processing department's long-range plan. It was therefore believed that they would also have the greatest need for data from a future administrative database, and so ought to have input to its design through this survey.

The survey was conducted by mailing to each subject a packet containing a deck of thirty-nine cards describing each entity, survey instructions, and response forms to be returned to the researcher. The instructions were as follows:


Step 1  Read all the steps of the process.

Step 2  Take a few minutes to identify the major decision areas in your administration in which data from the University's database, either in summary or detail form, would be helpful.

Step 3  Describe each decision area on a separate decision form. Five forms have been provided for this purpose. You may select as many decision areas as you wish, up to five.

Step 4  Indicate the number of times each year that you would need updated reports from the computer for this decision area in the space called "frequency of access".

Step 5  Look at each entity card and decide if the attributes of that entity would be useful data for the first decision area.

Step 6  For each entity which would be helpful for the first decision area, write the entity number on the decision form in the spaces called "Applicable Entity Numbers."

Step 7  Repeat steps 4 through 6 for each decision area.

Step 8  Return the decision forms.


Analysis of Administrator Groupings

A number of analysis techniques were tested, including: observation of matrices showing the number of times each entity was grouped with each other entity; factor analysis of the groupings; and several cluster analysis techniques.
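The first technique, a matrix of how often each pair of entities was grouped together, can be sketched in Python as below. The sketch assumes each returned decision form has been reduced to the set of entity numbers written in its "Applicable Entity Numbers" spaces (1 to 39, following the entity list above). The paper does not say whether the "frequency of access" answers were used as weights, so this sketch counts each form once; the sample forms are hypothetical.

```python
from itertools import combinations


def cooccurrence_matrix(forms, n_entities=39):
    """Return an n x n list of lists of pair counts.

    counts[a - 1][b - 1] is the number of decision forms on which entities
    a and b both appear; the diagonal counts the forms naming each entity.
    """
    counts = [[0] * n_entities for _ in range(n_entities)]
    for form in forms:
        entities = sorted(set(form))
        for e in entities:
            counts[e - 1][e - 1] += 1
        for a, b in combinations(entities, 2):
            counts[a - 1][b - 1] += 1
            counts[b - 1][a - 1] += 1
    return counts


# Hypothetical forms: 22 = Employee, 24 = Employee Tax Status,
# 39 = Year-to-date Payment, 16 = Courses Registered For, 32 = Registration.
forms = [{22, 24, 39}, {24, 39}, {16, 32}]
m = cooccurrence_matrix(forms)
print(m[24 - 1][39 - 1], m[24 - 1][32 - 1])  # 2 0
```

With 39 entities the full matrix has 741 distinct off-diagonal pairs, which helps explain why it was hard to read by eye.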

The matrices proved to be too large and complex to permit useful analysis by observation. Factor analysis of the groupings showed no useful clusters of entities and was therefore ruled out as an analysis method. Before discussing the results from the cluster analysis methods tested, a brief discussion of cluster analysis may be useful.
Blashfield suggests that there are probably one hundred different clustering techniques available today, each described in terms unique to the field in which the method originated (Blashfield, 1976). Each clustering technique has characteristics which may make it more or less useful for clustering a given set of data. The impact of the characteristics of a given method on a given set of data cannot always be predicted, so the results of a given cluster analysis must be tested for their validity. Further, Blashfield recommends that, rather than select a particular clustering method, the researcher may wish to try several methods and compare the results before selecting one. In this way, the researcher can analyze the differences between results to decide which method has characteristics which will be most useful for the analysis.
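One way to compare the results of several methods, as Blashfield recommends, is to measure how often two partitions of the same entities agree. The Python sketch below uses the Rand index, a standard agreement measure that the paper itself does not name. It assumes each method's hierarchy has already been cut into the same number of groups; the entity labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of entity pairs on which two partitions agree.

    A pair agrees when both partitions place it in the same cluster, or
    both place it in different clusters.
    """
    entities = sorted(labels_a)
    if sorted(labels_b) != entities:
        raise ValueError("both partitions must cover the same entities")
    pairs = list(combinations(entities, 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[x] == labels_a[y]) == (labels_b[x] == labels_b[y])
        for x, y in pairs
    )
    return agree / len(pairs)


# Hypothetical four-entity partitions from two linkage methods.
complete = {"Employee": 1, "Employee Tax Status": 1, "Course": 2, "Room": 2}
average = {"Employee": "A", "Employee Tax Status": "A", "Course": "B", "Room": "A"}
print(rand_index(complete, average))  # 0.5
```

Only the grouping matters, not the label values, so two methods that produce the same groups under different names score 1.0.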


Blashfield describes four clustering methods in his article, each of which was used in the analysis of the administrator groupings. The methods differ in the way each links entities into groups or clusters. All four are hierarchical clustering methods, which give an output format suited to the analysis required here. The four methods are:
Single linkage: Each entity is linked into the cluster which it is most like, or in this case, the cluster it was used with most frequently.

Complete linkage: Each entity links to the cluster containing all entities which are more similar to it than are all the entities of any other cluster, where similarity is measured by the frequency with which entities were used together.

Average linkage: Each entity links with the cluster whose average value is most similar to it.

Minimum variance (Ward method): This method clusters entities in such a way that the variance between entities within a cluster is minimized.
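The three linkage rules can be contrasted in a naive Python sketch of agglomerative clustering over a similarity matrix, where similarity is the number of times two entities were grouped together. This is not Anderberg's programs; it is a minimal re-implementation for illustration, and the matrix values are hypothetical. Ward's method is left out because it needs a numeric representation of each entity on which variance can be computed, rather than raw pair counts.

```python
def agglomerate(similarity, method="complete"):
    """Naive agglomerative clustering on a symmetric similarity matrix.

    similarity[i][j] counts how often entities i and j were grouped
    together (higher means more alike). Returns the merge history as
    (cluster_a, cluster_b, level) tuples, where clusters are frozensets
    of entity indices and level is the similarity at which they joined.
    """
    link = {
        "single": max,    # most similar pair across the two clusters
        "complete": min,  # least similar pair across the two clusters
        "average": lambda values: sum(values) / len(values),
    }[method]
    clusters = [frozenset([i]) for i in range(len(similarity))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                values = [similarity[i][j] for i in clusters[a] for j in clusters[b]]
                level = link(values)
                if best is None or level > best[0]:
                    best = (level, a, b)
        level, a, b = best
        merges.append((clusters[a], clusters[b], level))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


S = [
    [0, 9, 1, 0],
    [9, 0, 2, 1],
    [1, 2, 0, 7],
    [0, 1, 7, 0],
]
for method in ("single", "complete", "average"):
    print(method, [level for _, _, level in agglomerate(S, method)])
```

Because the similarity here is a count, single linkage takes the largest pair count between two clusters and complete linkage the smallest; with distances the roles of max and min would be reversed. The exhaustive search is slow in general but adequate for thirty-nine entities.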


Anderberg has written a series of programs which will perform hierarchical cluster analysis by seven different methods, including those mentioned above (Anderberg, 1973). In this study all four methods were used; the results are described in the Results section. The complete linkage and average linkage methods appear to yield the most reasonable solution, based on comparison of these solutions to current data file structures.

Results


The data structures as currently used may be approximately divided into seven major groups: employee, facility, address, course, student, financial and development. These structures were compared to the cluster analysis to determine the reliability of the results, as well as relationships which appear in the results but are not in current structures. Although a number of different clusterings were computed and each provided additional insight into the probable future relational requirements of top administrators, only one clustering result is presented for discussion here (see Baltes for a detailed discussion of the analysis and findings of the study).
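The paper does not describe how clusters were matched to current structures. As a hypothetical illustration, the Python sketch below pairs a cluster with the current file whose entity set overlaps it most (Jaccard similarity, the size of the intersection divided by the size of the union) and lists the cluster's entities that the matched file does not carry, which are candidates for relationships absent from current structures. The file contents shown are assumptions, not the University's actual files.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare_to_current(cluster, current_files):
    """Match a cluster to its most similar current file.

    Returns (file name, Jaccard similarity, sorted entities in the cluster
    that the matched file does not carry).
    """
    name = max(current_files, key=lambda f: jaccard(cluster, current_files[f]))
    score = jaccard(cluster, current_files[name])
    return name, score, sorted(cluster - current_files[name])


# Hypothetical file contents, using entity names from the entity list.
CURRENT_FILES = {
    "employee": {"Employee", "Employee Class", "Salary History"},
    "student": {"Student/Applicant", "Registration", "Courses Registered For"},
    "development": {"Alumni Membership", "Donor Entity"},
}

cluster = {"Student/Applicant", "Registration", "Alumni Membership"}
print(compare_to_current(cluster, CURRENT_FILES))
# ('student', 0.5, ['Alumni Membership'])
```

Jaccard similarity penalizes both missing and extra entities, so a small file is not automatically the best match for a large cluster.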
Figure I below shows one hierarchical clustering of the entity groupings.

The strength of a relationship between two entities, or between one entity and a cluster, may be measured by the point at which they join in the hierarchy. Entities with strong relationships (grouped together with high frequency) will connect on the left-hand side, while those with weaker relationships will connect further toward the right. For example, administrators grouped employee tax status with year-to-date payment data very often, while employee tax status was almost never grouped with registration. Ultimately, by design, this analysis method will always link every entity in the groupings.
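Reading relationship strength off the hierarchy amounts to finding the level at which two entities first fall in the same cluster, and drawing a line across the hierarchy amounts to keeping only merges above some level. The Python sketch below assumes the hierarchy is available as a merge history, a list of (cluster, cluster, similarity level) entries in merge order, and that levels never increase from one merge to the next, as holds for single, complete and average linkage. The levels in the example are hypothetical.

```python
def join_level(merges, x, y):
    """Similarity level at which entities x and y first share a cluster."""
    for a, b, level in merges:
        if (x in a and y in b) or (x in b and y in a):
            return level
    raise ValueError(f"{x!r} and {y!r} are never joined")


def cut(merges, entities, threshold):
    """Groups formed by all merges at or above `threshold`.

    Assumes merge levels never increase along the history; otherwise a
    merge above the cut could absorb a cluster that was formed below it.
    """
    group_of = {e: frozenset([e]) for e in entities}
    for a, b, level in merges:
        if level >= threshold:
            merged = a | b
            for e in merged:
                group_of[e] = merged
    return set(group_of.values())


# Hypothetical merge history (levels are made-up pair counts).
TAX, YTD, EMP, REG = "Employee Tax Status", "YTD Payment", "Employee", "Registration"
MERGES = [
    (frozenset({TAX}), frozenset({YTD}), 12),
    (frozenset({TAX, YTD}), frozenset({EMP}), 6),
    (frozenset({TAX, YTD, EMP}), frozenset({REG}), 0),
]
print(join_level(MERGES, TAX, YTD), join_level(MERGES, TAX, REG))  # 12 0
print(cut(MERGES, [TAX, YTD, EMP, REG], threshold=5))
```

Lowering the threshold moves the line toward the right of the hierarchy and yields fewer, larger groups.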

Examination of the results shows that only five separate groupings (divided by the dashed line) can be identified, compared with the seven in current structures. The top cluster compares roughly to the employee file. The second cluster is comparable to the current financial file. The third cluster compares to the current course file. The fourth cluster combines the current address and development files, and the fifth relates to the current student file.

[Figure I: Hierarchical clustering of the entity groupings. Entities in the order shown: Employee Tax Status, YTD Payment Data, Deduction, Salary History, Deduction History, Position Control, Employee, Employee Class, Appointment, Home Address, Fringe Benefits, Promotion History, Skill or Education, Expense Class, Parent Department, Account Balance, Account Profile, Authorization, Accounting Transaction, Course, Course Section, Room, Business Address, Campus Address, Building, Class Schedule, Donation, Non-College Academic Record, Student Family, Application Control, Student Financial, Applicant Test Score, Student or Applicant, Application for Admission, Advanced Standing Academic Record, Alumni Membership, Degree Awarded, Courses Registered In, Registration.]


The summary statements drawn from this analysis are:


(1) Top administrators view alumni data as part of the student file, rather than as part of the development file where current structures carry the data.

(2) Top administration is not concerned about facilities data except as it relates to the course schedule and campus address.

(3) Authorizations are an accounting entity related to building construction and maintenance. Although current structures carry authorization data in the accounting system, top administration appears to see authorization as strongly related to room. (Current data processing systems can relate authorization data with rooms only with great difficulty and expense.)


Although the above comments relate to one clustering, a number of clusterings were done. As each of the clusterings was examined, additional relationships not supported by current systems were uncovered: relationships which future databases must support if they are to be responsive to management's needs.


Conclusions


The methodology proved to be a very useful tool for the database administrator in anticipating future relational requirements of the databases he designs. Some difficulties were encountered with responses and with the ability of some subjects to understand the instructions. An interview format might have resolved some of these problems.


The implications of this study for database designers must be viewed from the perspective of a total systems design scenario. The database designer must know a good deal more about the data resource than was necessary to design a conventional single-application file. He must understand: the logical views which the database must be able to present for all users; security and privacy constraints for all entities in the database; the ... of the database system which will be used and their impacts on the physical implementation of the database; the value of the data as a resource of the institution, which will have implications for what the institution will ... in terms of ... of use and frequency of use; and other aspects which I have not mentioned.

This methodology could be effective not only in building a relational model of the database reflecting top administration's view of the data but also in finding all logical views that users may require of the database. If prospective users were asked to group entities not by decision area but by frequency of use, level of security or privacy constraint, and so on, the database administrator could cluster each set of responses to get different perspectives of the data, all of which will need to be taken into consideration in the final implementation of the database.